Complete Linux

home *** CD-ROM | disk | FTP | other *** search

/ Complete Linux / Complete Linux.iso / docs / apps / database / postgres / postgre2.z / postgre2 / doc / implementation / planner / 4.imple < prev next >

Wrap

Text File | 1992-08-27 | 11.8 KB | 285 lines

.sh 1 "IMPLEMENTATION ISSUES" 4 .pp This section does not attempt to describe in detail the actual implementation of the POSTGRES optimizer. Rather, it focuses on important decisions made in building the optimizer. .sh 2 "Choice of Language" .pp Query optimization requires a great deal of element manipulation. The optimizer must separate, modify, and regroup elements within a query's target list and qualification to create new components for individual nodes in the plan tree. A programming language like LISP is well-suited for this type of processing because the language contains functions and data structures specifically designed for object manipulation tasks. Consequently, for ease of implementation, we chose to write the POSTGRES query optimizer in LISP. .pp Franz LISP, Opus 42 [FRAN85] was the selected LISP dialect. It was chosen because it was readily available and also because it supports a foreign function interface. A foreign function interface allows LISP code to call routines written in other programming languages. This feature is of utmost importance because the optimizer must call C routines to access information from database system catalogs, given that the access method code in POSTGRES is being written in C. .pp Franz LISP, Opus 42 is also fairly compatible with CommonLISP [STEE84], an emerging standard LISP dialect. Therefore, in the future, if translation from Franz to CommonLISP is necessary, this will require minimal effort. .pp In general, compiled LISP code executes less efficiently than compiled C code. Therefore, an optimizer written in LISP will execute more slowly than an optimizer written in C. This, however, is not a problem. As discussed in section 3.1, POSTGRES compiles query plans and caches, for later use, plans and tuples resulting from query preexecution. Because of these two features, a single query plan produced by the optimizer may be used several times. As a result, the cost of optimization is amortized over several executions. This significantly reduces a query's planning cost, yielding a figure that is minimal relative to the overall cost of execution. Therefore, in terms of optimizer efficiency, the choice of language is not a major concern. .sh 2 "Representing Query Plans in LISP" .pp In general, the cost of query processing constitutes the most significant portion of a query's execution cost. Therefore, the query processor must execute as cost efficiently as possible. To meet this goal, every node in the plan tree is constructed using one-dimensional arrays. These are known as \*(lqvectors\*(rq in LISP. Each element within a vector corresponds to some property of a node. By indexing appropriate vector entries, all properties can be accessed in constant time. .pp Among the properties within each plan node are the left and right subtrees of the node, target lists, and qualifications. The left and right subtrees either point to another plan node or nothing (\fBnil\fR in LISP). The target list and qualification entries respectively point to a list of individual target list elements and a list of restriction clauses. Lists are used to represent these structures because both sets of items are variable length, and random access to individual entries within these lists is not required. Each target list item consists of two items, also grouped together in a list. The first item in the list is a \*(lqresdom\*(rq node. It contains information about its corresponding target list entry \*- its type, destination, and if relevant, sort or hash information. Each resdom node is implemented using a vector. The second element, an \fIexpr\fR, is an arbitrary arithmetic expression consisting of variables, constants, parameters, functions, and operators. Each of these subcomponents is also a vector, and these vectors are linked together in a list if they represent the arguments to a particular operation or function. A restriction clause is a boolean \fIexpr\fR; therefore the preceding description applies to qualifications as well. .pp In addition, every plan node contains an empty slot that the query processor uses to store runtime-specific query information. Figure 4.1 shows the internal representation of a query plan that accesses two attributes and processes a single join clause using nested iteration. .(z .hl .GS file figure4.1 .GE .sp .ce 2 \fBFigure 4.1\fR Internal representation of a nested iteration join node .hl .)z .pp Constructs analogous to records in Pascal and \*(lqstructs\*(rq in C are used to build the different vector types associated with each node type. These constructs are called \*(lqdefstructs\*(rq in LISP. With defstructs, LISP programmers can combine primitive data types to create structured items. These new data structures, in turn, can be combined and nested to create even more complex structures. After defining a defstruct, LISP automatically provides a set of routines to dynamically create objects of these structured types and to access information within a structure. As a result, although a vector is the underlying data type used to implement defstructs, users can access node properties by specifying field names, as opposed to indexing vector entries. Figure 4.2 shows a Franz LISP defstruct definition and associated routines for a nestloop node. .(z .hl .(l .sz \n(ep .ft I ; Plan information common to all plan nodes .ft R (\fBdefstruct\fR (plannode (\fB:conc-name\fR get_)) (state \fBnil\fR) targetlist (lefttree \fBnil\fR) (righttree \fBnil\fR)) \fI; initialize lefttree, righttree, and state to nil\fR .ft I ; Nestloop node .ft R (\fBdefstruct\fR (nestloop (\fB:include\fR plannode) \fI; node contains defstruct defined above\fR (\fB:conc-name\fR get_) (\fB:constructor\fR make_nestloop (targetlist qual lefttree righttree))) (nodetype \*(lqNESTLOOP\*(rq) qual) .sp .ft I ; ; LISP routines provided as a result of the above definitions: ; ; Routines to retrieve property fields: (\fBget_state \fRnode) (\fBget_targetlist\fR node) (\fBget_lefttree\fR node) (\fBget_righttree\fR node) (\fBget_nodetype\fR node) (\fBget_qual \fRnode) \fI; Routine to construct a nestloop node:\fR (\fBmake_nestloop\fR targetlist qual lefttree righttree) .sz \n(pp .)l .sp 2 .ce 2 \fBFigure 4.2\fR Sample defstruct definition .hl .)z .sh 2 "Internal Data Structures" .pp In the process of creating possible query plans, the optimizer generates a great deal of information. To keep track of all this information, a more flexible structure of LISP is used \*- property lists. Every LISP object may have associated with it a list of characteristics, set by the user, called a property list. A major advantage of property lists is that one does not have to preallocate space for property slots, as required for defstructs. As a result, at any given time, every object may have an arbitrary number of properties associated with it. .pp Property lists are implemented using linked lists. Thus, access to information within a property list requires a linear search. The inefficiency of linear search is not a problem here because generally, the optimizer does not store any more than four or five items within a single property list, and as indicated in section 4.1, efficiency is not a primary consideration in this part of the system. Therefore, because property lists are simpler to work with, they are used extensively within the optimizer. .sh 2 "LISP-C Interface" .pp The foreign function interface in LISP is fairly easy to work with, provided a set of stringent rules are followed. For example, the standard way to pass structured information (e.g. a character string) from a C function is to return a pointer to the object. From here, the user can manipulate information within the object by referencing the pointer. This, however, will not work when LISP calls C because LISP cannot manipulate objects that C has allocated. It presents problems for the LISP garbage collector. .pp To work around this, C can return structured information by loading the data into variables that LISP has passed as parameters. Space for these return variables must be allocated by LISP prior to the C call. This is straightforward provided LISP knows the size of the returning object and can set aside a sufficient amount of memory. However, this is not always the case because tuples returned by C access method routines are variable length. .pp Fortunately, the optimizer never requires the contents of an entire tuple; on all occasions, it only needs a fixed set of attributes from within a single tuple. Therefore, rather than attempt to directly manipulate arbitrary tuples returned by access method routines, a layer written in C was built between the optimizer and the access method code. When the optimizer needs information from system catalogs, it calls some routine within this layer, which then calls access method routines to retrieve tuples. Desired information within these tuples are either returned explicitly as integers and floats, or they are passed back within variables allocated by LISP. .pp As an example, the optimizer may call a C routine, within the layer, called \*(lqretrieve_index\*(rq to retrieve information about a secondary index. In calling the routine, LISP passes a pointer to an integer array \*(lqindexinfo.\*(rq \*(lqRetrieve_index\*(rq then calls the access method routine \*(lqgetnext\*(rq until an appropriate tuple from the index catalog has been located. The index identifier, the number of pages in the index, and any other relevant information are extracted from the tuple and passed back to the optimizer in the array \*(lqindexinfo.\*(rq Consequently, variable length tuples are handled solely by C, resulting in a cleaner and simpler LISP-C interface. .sh 2 "An Evaluation of Using LISP" .pp Overall, writing the POSTGRES optimizer required about 6500 lines of LISP code and another 700 lines of C code. Having written the optimizer, using LISP was an excellent choice. There was a close match between our processing needs and the constructs and functions LISP provides. As a result, the programming effort was simplified. Had we used a language like C, we would have had to explicitly implement structures and routines equivalent to those LISP provides. .pp While writing the optimizer, it was also evident that other features of LISP were instrumental in simplifying code development. For instance, LISP allows you to either interpret or compile code written in the language. Naturally, compiled code is used in POSTGRES, but in developing the optimizer, the interpretive option was used. This significantly reduced development time because debugging was simpler and compilation time was eliminated. .pp LISP also supports dynamic allocation and implicit recollection of free space. The latter is implemented using garbage collection. As a result of these two properties, the optimizer can easily create objects of any type when needed, and LISP automatically handles memory management issues. .pp Last of all, LISP is a weakly typed language and because no single type is associated with variables in weakly typed languages, union types were implicit and variable declarations were unnecessary. This further resulted in simpler data structure definitions because declaration of field types was also unnecessary, as shown in figure 4.2. Another advantage of weakly typed languages is the absence of strict type checking. As a result, there is a certain degree of independence between the parameters a routine accepts and those that are actually passed. For example, if a routine accepts an identifier as a parameter but does not manipulate its actual value, then whether the identifier is an integer or string is not significant; choosing one or the other will not affect code within the routine. In many situations, this characteristic allowed us to make changes without modifying other relevant pieces of code. Changes could be made much more quickly as a result. So to briefly summarize, LISP was a simpler and much more flexible language to work with.